Text retrieval from early printed books
Related resources
Transfer Learning for OCRopus Model Training on Early Printed Books
A method is presented that significantly reduces the character error rates for OCR text obtained from OCRopus models trained on early printed books when only small amounts of diplomatic transcriptions are available. This is achieved by building from already existing models during training instead of starting from scratch. To overcome the discrepancies between the set of characters of the pretra...
A Catalogue of Printed Books in the Wellcome Historical Medical Library. II—Books printed from 1641 to 1850 A—E
A Catalogue of Printed Books in the Wellcome Historical Medical Library. II-Books printed from 1641 to 1850 A-E, London, The Wellcome Historical Medical Library, 1966, pp. xi, 540, £10 10s. The second part of the Wellcome Catalogue of Printed Books, of which this volume is the first instalment, covers a period much less fully explored by bibliographers and historians than the first volume which ...
Printed Books to 1640 (Julian Roberts)
Bibliography is a technique proper to librarians. There are many outstanding exceptions to this resoundingly simple statement, but it nevertheless remains true that the librarian is the principal interpreter and beneficiary of the evidence which books, through their physical features, offer about themselves. One of the librarian's most elementary acts, that of cataloging, is bibliographical in ...
The Labeled Segmentation of Printed Books
We introduce the task of book structure labeling: segmenting and assigning a fixed category (such as TABLE OF CONTENTS, PREFACE, INDEX) to the document structure of printed books. We manually annotate the page-level structural categories for a large dataset totaling 294,816 pages in 1,055 books evenly sampled from 1750–1922, and present empirical results comparing the performance of several cl...
Improving OCR Accuracy on Early Printed Books by combining Pretraining, Voting, and Active Learning
We combine three methods which significantly improve the OCR accuracy of OCR models trained on early printed books: (1) The pretraining method utilizes the information stored in already existing models trained on a variety of typesets (mixed models) instead of starting the training from scratch. (2) Performing cross fold training on a single set of ground truth data (line images and their trans...
Journal
Journal title: International Journal on Document Analysis and Recognition (IJDAR)
Year: 2011
ISSN: 1433-2833,1433-2825
DOI: 10.1007/s10032-010-0146-0